
    Foreword to the Special Issue: "Semantics for Big Data Integration"

    In recent years, a great deal of interest has been shown toward big data. Much of the work on big data has focused on volume and velocity, i.e., on the size of datasets and the speed at which they must be processed. However, the problems of variety and veracity are equally important in dealing with the heterogeneity, diversity, and complexity of data, and semantic technologies can be explored to address these issues. This Special Issue aims at discussing emerging approaches from academic and industrial stakeholders and at disseminating innovative solutions that explore how big data can leverage semantics, for example by examining the challenges and opportunities arising from adapting and transferring semantic technologies to the big data context.

    Managing the Process of Segmentation on the Mobile Phone Subscribers

    Most telecommunications providers possess a remarkable amount of data about their subscribers. The knowledge that can be discovered in a telecommunications provider's database is vital to understanding subscriber behavior. This is the task of subscriber segmentation: identifying and selecting the subscribers most likely to respond favorably to offers. Our paper proposes a set of techniques to analyze and design tools that manage the process of data acquisition, data cleaning, and selection of the segmentation algorithm.
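    A minimal sketch of such a segmentation pipeline might look like the following; this is a hypothetical illustration, not the paper's actual tools, and the file name, column names, and choice of k-means with scikit-learn are all assumptions:

    # Hypothetical subscriber-segmentation pipeline: acquisition, cleaning,
    # and clustering. Column names and k-means are illustrative assumptions.
    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    # Data acquisition: usage records exported from the provider's database.
    subscribers = pd.read_csv("subscribers.csv")  # assumed file and columns

    # Data cleaning: remove duplicate subscribers, fill missing usage values.
    features = ["call_minutes", "data_mb", "top_ups"]
    subscribers = subscribers.drop_duplicates(subset="subscriber_id")
    subscribers[features] = subscribers[features].fillna(0)

    # Segmentation: standardize the features, then cluster into k segments.
    X = StandardScaler().fit_transform(subscribers[features])
    subscribers["segment"] = KMeans(n_clusters=4, random_state=0).fit_predict(X)

    # Segments can now be profiled and ranked by expected offer response.
    print(subscribers.groupby("segment")[features].mean())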

    Semantic Integration of heterogeneous data sources in the MOMIS Data Transformation System

    In the last twenty years, many data integration systems following a classical wrapper/mediator architecture and providing a Global Virtual Schema (a.k.a. Global Virtual View - GVV) have been proposed by the research community. The main issues faced by these approaches range from system-level heterogeneities, through structural (syntax-level) heterogeneities, to heterogeneities at the semantic level. Despite the research effort, all the proposed approaches require substantial user intervention for customizing and managing the data integration and reconciliation tasks. In some cases the effort and the complexity of the task are huge, since they require the development of specific program code. Unfortunately, due to the specificity to be addressed, application code and solutions are rarely reusable in other domains. For this reason, the Lowell Report 2005 provided the guidelines for the definition of a public benchmark for the information integration problem. The proposal, called THALIA (Test Harness for the Assessment of Legacy information Integration Approaches), focuses on how data integration systems manage syntactic and semantic heterogeneities, which are definitely the greatest technical challenges in the field. We developed a Data Transformation System (DTS) that supports data transformation functions and produces query translations in order to push query execution down to the sources. Our DTS is based on MOMIS, a mediator-based data integration system that our research group has been developing and supporting since 1999. In this paper, we show how the DTS is able to solve all twelve queries of the THALIA benchmark by using a simple combination of declarative translation functions already available in standard SQL. We think that this is a remarkable result, mainly for two reasons: firstly, to the best of our knowledge no system has provided a complete answer to the benchmark; secondly, our queries do not require the overhead of any new code.
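    To make the idea concrete, here is a small illustrative sketch (not MOMIS itself; the two course schemas and the name-format heterogeneity are invented for the example) of how a THALIA-style syntactic heterogeneity can be reconciled purely with declarative translation functions available in standard SQL, shown via Python's sqlite3:

    # Illustrative only: two sources store the instructor name differently;
    # standard-SQL string functions reconcile them inside the query itself,
    # pushing the transformation down to the sources instead of new code.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE uni_a (course TEXT, instructor TEXT);        -- 'Smith, John'
    CREATE TABLE uni_b (course TEXT, first TEXT, last TEXT);  -- split fields
    INSERT INTO uni_a VALUES ('Databases', 'Smith, John');
    INSERT INTO uni_b VALUES ('Compilers', 'Ada', 'Lovelace');
    """)

    rows = conn.execute("""
    SELECT course,
           TRIM(SUBSTR(instructor, INSTR(instructor, ',') + 1)) || ' ' ||
           SUBSTR(instructor, 1, INSTR(instructor, ',') - 1) AS instructor
    FROM uni_a
    UNION ALL
    SELECT course, first || ' ' || last AS instructor FROM uni_b
    """).fetchall()

    print(rows)  # [('Databases', 'John Smith'), ('Compilers', 'Ada Lovelace')]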

    Lexical Knowledge Extraction: an Effective Approach to Schema and Ontology Matching

    This paper’s aim is to examine the role that Lexical Knowledge Extraction plays in data integration as well as in ontology engineering. Data integration is the problem of combining data residing at distributed heterogeneous sources and providing the user with a unified view of these data; a common and important scenario in data integration involves structured or semi-structured data sources described by a schema. Ontology engineering is a subfield of knowledge engineering that studies the methodologies for building and maintaining ontologies. Ontology engineering offers a direction towards solving the interoperability problems brought about by semantic obstacles, such as those related to the definitions of business terms and software classes. In these contexts, where users are confronted with heterogeneous information, the support of matching techniques is crucial. Matching techniques aim at finding correspondences between semantically related entities of different schemata/ontologies. Several matching techniques have been proposed in the literature, based on different approaches often derived from other fields, such as text similarity, graph comparison, and machine learning. This paper proposes a matching technique based on Lexical Knowledge Extraction: first, an Automatic Lexical Annotation of schemata/ontologies is performed, then lexical relationships are extracted based on such annotations. A lexical annotation is a piece of information added to a document (book, online record, video, or other data) that refers to a semantic resource such as WordNet. Each annotation is associated with one or more lexical descriptions. Lexical annotation is performed by the Probabilistic Word Sense Disambiguation (PWSD) method, which combines several disambiguation algorithms. Our hypothesis is that performing lexical annotation of the elements (e.g. classes and properties/attributes) of schemata/ontologies enables the system to automatically extract the lexical knowledge that is implicit in a schema/ontology and then to derive lexical relationships between the elements of a single schema/ontology or among elements of different schemata/ontologies. The effectiveness of the method presented in this paper has been proven within the data integration system MOMIS.
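    A hedged sketch of the idea follows, using NLTK's WordNet interface; picking the most frequent sense stands in for the paper's PWSD method, which is probabilistic and combines several disambiguation algorithms:

    # Sketch of lexical annotation and lexical-relationship extraction.
    # First-sense selection below is a simplification of PWSD.
    from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

    def annotate(label):
        """Annotate a schema label with a WordNet sense (most frequent)."""
        senses = wn.synsets(label, pos=wn.NOUN)
        return senses[0] if senses else None

    def lexical_relationship(label_a, label_b):
        """Derive a lexical relationship between two annotated labels."""
        a, b = annotate(label_a), annotate(label_b)
        if a is None or b is None:
            return "unknown"
        if a == b:
            return "synonym"        # same sense, e.g. 'car' and 'auto'
        if b in a.closure(lambda s: s.hypernyms()):
            return "narrower-than"  # label_a is a hyponym of label_b
        if a in b.closure(lambda s: s.hypernyms()):
            return "broader-than"   # label_a is a hypernym of label_b
        return "none"

    print(lexical_relationship("car", "auto"))     # synonym
    print(lexical_relationship("car", "vehicle"))  # narrower-than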

    Automatic Normalization and Annotation for Discovering Semantic Mappings

    Schema matching is the problem of finding relationships among concepts across heterogeneous data sources (heterogeneous in format and in structure). Starting from the “hidden meaning” associated with schema labels (i.e. class/attribute names), it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) helps in associating a “meaning” to schema labels. However, the accuracy of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns and word abbreviations. In this work, we address this problem by proposing a schema label normalization method which increases the number of comparable labels. Unlike other solutions, the method semi-automatically expands abbreviations and annotates compound terms with minimal manual effort. We empirically prove that our normalization method helps in the identification of similarities among schema elements of different data sources, thus improving schema matching accuracy.
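    As a rough illustration (the abbreviation table and the camel-case splitting rule below are invented for the example; the paper's method derives expansions semi-automatically rather than from a fixed dictionary):

    # Sketch of schema-label normalization: split compound labels and
    # expand abbreviations so that more labels become comparable.
    import re

    ABBREVIATIONS = {"qty": "quantity", "addr": "address", "emp": "employee"}

    def normalize(label):
        """Tokenize a compound schema label and expand known abbreviations."""
        # Split camelCase/PascalCase, underscores, and digits:
        # 'empAddr' -> ['emp', 'Addr']
        tokens = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+",
                            label.replace("_", " "))
        return " ".join(ABBREVIATIONS.get(t.lower(), t.lower()) for t in tokens)

    print(normalize("empAddr"))    # 'employee address'
    print(normalize("order_qty"))  # 'order quantity'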

    SparkER: Scaling Entity Resolution in Spark

    We present SparkER, an Entity Resolution (ER) tool that can scale practitioners’ favorite ER algorithms. SparkER has been devised to take full advantage of parallel and distributed computation, running on top of Apache Spark. The first SparkER version focused on the blocking step and implements both the schema-agnostic and the Blast meta-blocking approaches (i.e. the state-of-the-art ones); a GUI was developed to let non-expert users run SparkER in an unsupervised mode. The new version of SparkER to be shown in this demo significantly extends the tool: Entity Matching and Entity Clustering modules have been added. Moreover, in addition to the completely unsupervised mode of the first version, a supervised mode has been added, in which the user is assisted in supervising the entire process and in injecting their knowledge in order to achieve the best result. During the demonstration, attendees will be shown how SparkER can significantly help in devising and debugging ER algorithms.
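    For intuition, here is a minimal PySpark sketch of schema-agnostic token blocking, the kind of step SparkER parallelizes; this is an illustration under assumed toy data, not SparkER's actual API:

    # Illustrative schema-agnostic token blocking in PySpark: every token
    # is a blocking key, so profiles sharing any token fall in one block.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("blocking-sketch").getOrCreate()

    profiles = spark.createDataFrame(
        [(0, "iPhone 13 128GB"), (1, "Apple iPhone 13"), (2, "Galaxy S21")],
        ["id", "description"],
    )

    blocks = (
        profiles
        .withColumn("token", F.explode(F.split(F.lower("description"), r"\s+")))
        .groupBy("token")
        .agg(F.collect_set("id").alias("block"))
        .filter(F.size("block") > 1)  # singleton blocks yield no comparisons
    )
    blocks.show(truncate=False)  # e.g. token 'iphone' -> block [0, 1]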

    An Ontology-Based Data Integration System for Data and Multimedia Sources

    Data integration is the problem of combining data residing at distributed heterogeneous sources, including multimedia sources, and providing the user with a unified view of these data. Ontology-based Data Integration involves the use of ontologies to effectively combine data and information from multiple heterogeneous sources [16]. With respect to the integration of data sources, ontologies can be used for the identification and association of semantically corresponding information concepts, i.e. for the definition of semantic mappings among concepts of the information sources. MOMIS is a Data Integration System which performs information extraction and integration from both structured and semi-structured data sources [6]. In [5] MOMIS was extended to manage “traditional” and “multimedia” data sources at the same time. STASIS is a comprehensive application suite which allows enterprises to simplify the mapping process between data schemas based on semantics [1]. Moreover, in STASIS, a general framework to perform Ontology-driven Semantic Mapping has been proposed [7]. This paper describes the early effort to combine the MOMIS and STASIS frameworks in order to obtain an effective approach to Ontology-Based Data Integration for data and multimedia sources.

    Big Data Integration for Data-Centric AI

    Big data integration represents one of the main challenges for the use of techniques and tools based on Artificial Intelligence (AI) in several crucial areas: eHealth, energy management, enterprise data, etc. In this context, Data-Centric AI plays a primary role in guaranteeing the quality of the data on which these tools and techniques operate. The activities of the Database Research Group (DBGroup) of the “Enzo Ferrari” Engineering Department of the University of Modena and Reggio Emilia are moving in this direction. We therefore present the main research projects of the DBGroup, which are part of collaborations in various application sectors.